
Poster: drexelmedarchives Date: Dec 11, 2012 5:53am
Forum: texts Subject: zipped jpg directory won't derive

I'm having trouble with uploads that aren't creating derivatives. I have 3 different records/objects of the same text:

1) After some tests and correcting file naming, this one derived fine as a subset of 20 files from the entire book: http://archive.org/details/PhiladelphiaWorldsMedicalCenterTest

2) Uploaded the entire set of files (~160) as a new test item, using the same file naming. The item never derived (15 days), has a crossed-out Torrent file, and shows red (waiting for admin) on the "Tasks that are done" view:
http://archive.org/details/PhiladelphiaWorldsMedicalCenter (test item)

3) Tried a subset again, of just 6 files, as a new item. It never derived (7 days), has a crossed-out Torrent file, and shows red (waiting for admin) on the "Tasks that are done" view:
http://archive.org/details/PhiladelphiaWorldsMedicalCentre

The only difference I can see in the jpgs that were successful is the sequence begins at _0001.jpg and those that haven't derived begin with _0000.jpg. Just uploaded same 6 files with sequence starting with _0001. 6 hours later and still no derivatives....

I admit to initiating another derive command and got this note: "there is already a derive.php task for this item. please wait for it to finish."

Hoping someone can offer some advice. Should I abandon the jpgs and upload a pdf?

Thanks for ideas/suggestions/life rafts.


Poster: garthus1 Date: Dec 12, 2012 4:13pm
Forum: texts Subject: Re: zipped jpg directory won't derive

This is an interesting pamphlet. I sometimes have similar issues with uploads not deriving; sometimes we just have to wait. Why not make a PDF file and upload it along with the archives?

Gerry


Poster: Jeff Kaplan Date: Dec 13, 2012 9:58pm
Forum: texts Subject: Re: zipped jpg directory won't derive

The two items are now fixed and derived.


Poster: drexelmedarchives Date: Dec 17, 2012 12:24pm
Forum: texts Subject: Re: zipped jpg directory won't derive

In uploading the files for the entire book, I'm again facing a derive failure for this item:

http://archive.org/details/PhiladelphiaWorldsMedicalCentre

The log for the process is here:
http://www.us.archive.org/log_show.php?task_id=136224171 with a sad note of: "FATAL ERROR -- EXITING: module failed! aborting rest of derive and failing!"

My sequence was:
I deleted the existing file PhiladelphiaWorldsMedicalCentre_images.zip and uploaded a new zip file
PhiladelphiaWorldsMedicalCentre_jpg.zip

I initiated the derive and the log said it was successful but I still only had 6 derived files.

Noticing the zip file from the admin-run successful derive was named differently I deleted the PhiladelphiaWorldsMedicalCentre_jpg.zip file and uploaded a new zip file named PhiladelphiaWorldsMedicalCentre_images.zip

Derive failed.

Thanks in advance for help and suggestions for succeeding with future contributions.


Poster: Jeff Kaplan Date: Dec 18, 2012 10:09am
Forum: texts Subject: Re: zipped jpg directory won't derive

The item has been fixed. The _images.zip needed to have Generic Raw Book Zip chosen as its format in the Edit item page, in the Files and Formats section, prior to running a derive.


Poster: drexelmedarchives Date: Dec 17, 2012 7:14am
Forum: texts Subject: Re: zipped jpg directory won't derive

Thank you!!!


Poster: aibek Date: Feb 20, 2013 6:33pm
Forum: texts Subject: Re: zipped jpg directory won't derive

For (2) note:

(1) 20 JPG, 216 MB.
(2) 160 TIFF, 116 MB.
(3) 6 JPG, 61 MB.

Thus, there is certainly something pretty wrong with (2).

Also, for (2) and (3), the numbers of images according to the respective metadata files are 21 and 325 respectively (and not 160 and 6).

---

In general, it is a good idea to upload the images themselves. The PDF would have the images significantly compressed, so the images are better for derivative work.

---

Added on 2013-02-21: ‘Compressed’ need not mean ‘degraded’. PDF files may contain losslessly compressed images too. For B/W images of text, the compression is pretty significant. I have an image at hand where the uncompressed TIFF is 51MB, and the same TIFF with Zip-Deflate compression (lossless) is only 500KB.

The size of the PDF file containing this image is 500KB too.
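The effect described above can be reproduced roughly with a synthetic page. The sketch below is only an illustration under stated assumptions: the page size and the stripe pattern are made up, and Python's zlib (Deflate) stands in for the Zip-Deflate option of a TIFF writer. It shows that the compressed data is far smaller and still decompresses to exactly the original bytes.

```python
import zlib

# Hypothetical page size in pixels, one byte per pixel (not from the thread).
WIDTH, HEIGHT = 2000, 3000

# Synthetic "page": white background with a text-like dashed line every 40 rows.
white_row = b"\xff" * WIDTH
ink_row = (b"\x00" * 8 + b"\xff" * 8) * (WIDTH // 16)
page = b"".join(ink_row if y % 40 == 0 else white_row for y in range(HEIGHT))

# Deflate at the highest compression level; this is lossless.
packed = zlib.compress(page, 9)

# Lossless means the round trip gives back identical bytes.
assert zlib.decompress(packed) == page
print(f"raw: {len(page)} bytes, deflated: {len(packed)} bytes")
```

A real scan contains noise and will not shrink as much as this artificial page, so the ratio printed here should not be read as typical.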

This post was modified by aibek on 2013-02-21 02:33:40


Poster: aibek Date: Dec 12, 2012 8:41pm
Forum: texts Subject: Re: zipped jpg directory won't derive

Since the earlier derivations are still running, your updated files are not being considered. Someone needs to cancel the running derivation attempts for (2) and (3).


Poster: drexelmedarchives Date: Dec 13, 2012 7:32am
Forum: texts Subject: Re: zipped jpg directory won't derive

Thanks for your responses.

I would like to have the quality of the uploaded images, if the process will eventually work.

Re: file counts: I deleted and re-uploaded files, so 1) I'm not sure whether the metadata aggregates the counts, and 2) does the metadata count need to match the number of files, and could the mismatch be causing the failure?

Aibek - yes, canceling the running derivation would help! Do I just wait that out?

Thanks!!!!


Poster: aibek Date: Dec 13, 2012 5:54pm
Forum: texts Subject: Requesting Jeff’s attention

Jeff:
The derivation attempts for the following two items have gone wrong because something was wrong with the uploaded files. Please stop the derivation attempts for them.

http://archive.org/details/PhiladelphiaWorldsMedicalCenter
http://archive.org/details/PhiladelphiaWorldsMedicalCentre

---
drexelmedarchives:
Actually you do not have to worry about the wrong count in metadata as it would be changed automatically -- once the system gets to your new files!

An administrator needs to cancel the derivation attempts. (It would not happen automatically.)


Poster: drexelmedarchives Date: Dec 17, 2012 5:52am
Forum: texts Subject: Re: Requesting Jeff’s attention

aibek - thanks for sharing your knowledge and experience!

Jeff - thanks for stopping the derive - the sample set of image files derived fine and we're so pleased! (Love it when things work!)

I now want to replace that sample set of files with the entire book. Before I mess up something that is working, can someone confirm the process? It seems I can either:
- delete the existing zip file, add a new zip file with all images, then initiate a re-derive (would I need to delete all derivatives?)
OR
- add a new zip file with only the additional images

We so appreciate the help and support in getting this one book online!


Poster: aibek Date: Dec 17, 2012 7:41am
Forum: texts Subject: Re: zipped jpg directory won't derive

Do the first. (Delete the existing file ….) You need not delete all derivatives.

This post was modified by aibek on 2012-12-17 15:41:50


Poster: aibek Date: Dec 17, 2012 6:37pm
Forum: texts Subject: Re: zipped jpg directory won't derive

Actually, you do need to delete all derivatives. Sorry.

So, just leave the _meta.xml in place, and delete everything else. (The _meta.xml file has the title, author, description, etc. saved in it.)


Poster: drexelmedarchives Date: Dec 18, 2012 5:28am
Forum: texts Subject: Re: zipped jpg directory won't derive

Someone stepped in for me. I didn't go in and delete more files but this is now derived and looking good.

The only real issue is I'm not confident I can upload new items from image files as I am not clear what went wrong and what to do differently next time around. If I were clear I would write out clear documentation and share it.

One confusing point is the naming of the zip file - I originally named it item-file-name_jpg.zip (based on documentation in the Text FAQ) and then changed that to item-file-name_images.zip, based on what I saw in the file list after Jeff successfully re-initiated a derive.

Meanwhile, thanks aibek and whoever else helped out here.


Poster: aibek Date: Dec 18, 2012 9:06am
Forum: texts Subject: Re: zipped jpg directory won't derive

The cause of the latest failure, at least, is clear. The log file you linked to shows that the earlier scandata.xml file was being used together with your new _images.zip file, which caused problems. This latest problem can be solved by deleting all the derived files (as I said in my later post).

Why the upload with _jpg.zip is failing is more involved. Could it be that something is wrong with your _0000.jpg file? When you upload the files with the name _images.zip, the image files are first converted into jp2 format, and OCR is run on the jp2 files. When the files are uploaded with the name _jpg.zip, OCR is run over the jpg files directly. I can imagine one point where the discrepancy can creep in: your _0000.jpg file has some problem, so OCR on it fails, but the jpg-to-jp2 converter is “forgiving”, so it eats up the bad jpg file to output a good jp2, on which the OCR processor runs without complaining. Can you upload your _0000.jpg and _0001.jpg files here so that I can check (on the forum, “Add files” while replying)?
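A quick local check for a damaged jpg can be sketched as below. This is only a rough heuristic and not how the derive process inspects files: it tests that the file begins with the JPEG start-of-image marker and ends with the end-of-image marker. The function name is made up for illustration. A file that fails is probably truncated or not a JPEG at all; a file that passes may still have other problems.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # start-of-image marker at the beginning of a JPEG
JPEG_EOI = b"\xff\xd9"  # end-of-image marker at the end of a JPEG


def looks_like_complete_jpeg(path):
    """Return True if the file starts with SOI and ends with EOI.

    Heuristic only: some valid files carry extra bytes after EOI and
    would be reported as incomplete here.
    """
    data = Path(path).read_bytes()
    return (
        len(data) >= 4
        and data.startswith(JPEG_SOI)
        and data.endswith(JPEG_EOI)
    )
```

Running it over the _0000.jpg and _0001.jpg files before zipping would at least rule out a truncated upload source.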

At any rate, you know what to do! Upload your files as _images.zip!

To summarize: Name all your files as *_0000.jpg, *_0001.jpg and so on (or *_0000.tif, *_0001.tif, and so on), collect them all in *_images.zip, and upload this file. And, if you are changing this file, delete all the derivatives first.
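The naming and zipping step can be sketched in Python as follows. This is a minimal sketch, not an official tool: the function name and its parameters are made up for illustration, and it assumes the images sit at the top level of the zip, which this thread does not confirm (a subdirectory may be expected instead). Pages are taken in sorted filename order, so the source files must already sort in page order.

```python
import zipfile
from pathlib import Path


def build_images_zip(src_dir, identifier, out_dir, start=0):
    """Write <identifier>_images.zip with pages named <identifier>_NNNN.jpg/.tif.

    Assumption: images are stored at the top level of the zip.
    """
    src = Path(src_dir)
    pages = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".tif", ".tiff"}
    )
    if not pages:
        raise ValueError(f"no JPG or TIFF files found in {src}")

    zip_path = Path(out_dir) / f"{identifier}_images.zip"
    # ZIP_STORED: the images are copied as-is, without recompression.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for number, page in enumerate(pages, start=start):
            is_tiff = page.suffix.lower() in {".tif", ".tiff"}
            ext = ".tif" if is_tiff else ".jpg"
            zf.write(page, arcname=f"{identifier}_{number:04d}{ext}")
    return zip_path
```

For example, build_images_zip("scans", "PhiladelphiaWorldsMedicalCentre", ".") would write PhiladelphiaWorldsMedicalCentre_images.zip into the current directory; pass start=1 to begin the sequence at _0001 instead of _0000.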

(By the way, if your scanner produces images in tif, you should upload the tifs. Jpg/Jpeg is a “lossy compression” format -- conversion from tif to jpg decreases the image quality. Tif/Tiff files are normally stored uncompressed or with lossless compression, and Bmp/bitmap files are normally uncompressed, so neither loses image data.)


Poster: drexelmedarchives Date: Dec 18, 2012 10:25am
Forum: texts Subject: Re: zipped jpg directory won't derive

Well, in working with several 30-day test objects I will tell you I did have one _jpg.zip file work, with just a few images. But no success subsequently (until today, with another test file).

Jeff responded to my other post and fixed the problem, with this comment: "The item has been fixed. The _images.zip needed to have Generic Raw Book Zip chosen as its format in the Edit item page, in the Files and Formats section, prior to running a derive." So now I know to look for that, too.

Thanks for the details on the process (I did not stick with the error log long enough to pull apart everything that was going on). I tried to submit this reply with the first 2 jpg files attached, but it seems neither the post nor the files uploaded, so I'm abandoning the attachments. As you say, I'll stick with *_images.zip for uploads. Presumably this will also be fine if we switch back to uploading tiffs. (We had started with tiffs but had to tar them and then ran into various problems; since I wasn't sure whether the tiffs or the tar file were at fault, I dropped down to jpgs.)

Whew. It's great when it works, though! Many thanks!!